Show
pacman::p_load(tidyverse, plotly, crosstalk, DT, ggdist, gganimate, ggstatsplot, heatmaply)Aruiana
February 5, 2023
February 15, 2023
To uncover the salient patterns of the resale prices of public housing property by residential towns and estates in Singapore by using appropriate analytical visualisation techniques l
For the purpose of this study, the focus in on 3-ROOM, 4-ROOM and 5-ROOM types in 2022.
Rows: 146338 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): month, town, flat_type, block, street_name, storey_range, flat_mode...
dbl (3): floor_area_sqm, lease_commence_date, resale_price
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Filter out the data required:
1. Room Type
2. Year 2022
#Filter 3Room, 4Room, 5Room, Filter 2022, Convert remaining lease into years
HDBRoom <- HDB %>% filter(flat_type=="3 ROOM" | flat_type=="4 ROOM" | flat_type=="5 ROOM") %>%
separate(month, into = c("year", "month")) %>%
filter(year == "2022") %>%
separate(remaining_lease, into = c("rmlease_years", "rmlease_month"), sep = "years") 1. Convert the Month from Character to Number
2. Convert Remaining lease from Character to Number
3. Re-categorise towns into regions
4. Sort Storey Range by smallest to largest
5. Create new dataset for price/sqm
#Convert Month from Chr to number
HDBRoom$month <- as.numeric(HDBRoom$month)
#Convert Remaining lease into numeric years in decimal
HDBRoom$rmlease_years <- as.numeric(HDBRoom$rmlease_years)
HDBRoom$rmlease_month <- gsub("[monthsmonth]", " ", HDBRoom$rmlease_month) %>%
as.numeric(HDBRoom$rmlease_month) / 12
HDBRoom$rmlease_month[is.na(HDBRoom$rmlease_month)] = 0
HDBRoom$rmlease <- as.numeric(HDBRoom$rmlease_years + HDBRoom$rmlease_month)
#Group Towns into Regions
HDBRoom$region <- case_when(
HDBRoom$town %in% c("ANG MO KIO", "HOUGANG", "PUNGGOL", "SERANGOON", "SENGKANG") ~ "North-East",
HDBRoom$town %in% c("BISHAN", "BUKIT MERAH", "BUKIT TIMAH", "CENTRAL AREA", "GEYLANG", "KALLANG/WHAMPOA", "MARINE PARADE", "QUEENSTOWN", "TOA PAYOH") ~ "Central",
HDBRoom$town %in% c("BEDOK", "PASIR RIS", "TAMPINES") ~ "East",
HDBRoom$town %in% c("SEMBAWANG", "WOODLANDS", "YISHUN") ~ "North",
HDBRoom$town %in% c("BUKIT BATOK", "BUKIT PANJANG", "CHOA CHU KANG", "CLEMENTI", "JURONG EAST", "JURONG WEST") ~ "West")
#Edit storey range and sort by smallest
HDBRoom$storey_range <- str_replace(HDBRoom$storey_range, "TO", "-")
sr_sort = c("01 - 03", "04 - 06", "07 - 09", "10 - 12", "13 - 15", "16 - 18", "19 - 21", "22 - 24","25 - 27","28 - 30", "31 - 33", "34 - 36", "37 - 39", "40 - 42", "43 - 45", "49 - 51", "46 - 48")
HDBRoom$storey_range <- factor (HDBRoom$storey_range, levels = sr_sort)
#Create additional data on price per sqm
HDBRoom$price_per_sqm <- (HDBRoom$resale_price / HDBRoom$floor_area_sqm)options(scipen = 999)
p1 <- gghistostats(
data = HDBDATA, x = "rmlease",
type = "bayes",
test.value = 100,
xlab = "Resale Property remaining lease"
)
p2 <- gghistostats(
data = HDBDATA, x = "month",
type = "bayes",
test.value = 100,
xlab = "Month of Purchase"
)
p3 <- gghistostats(
data = HDBDATA, x = "resale_price",
type = "bayes",
test.value = 100,
xlab = "Resale Price"
)
p4 <- gghistostats(
data = HDBDATA, x = "price_per_sqm",
type = "bayes",
test.value = 100,
xlab = "Resale Price/sqm"
)
p5 <- ggplot(
data = HDBDATA, aes(x = town, y=rmlease, colour = flat_type)) + geom_point() + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
labs( x = "Resale by Town")
p6 <- ggplot(
data = HDBDATA, aes(x = storey_range, fill = flat_type)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
labs( x = "Resale by Storey")
(p3 + p4) / (p1 + p2)


From the following histograms, we have the following findings:

Using a violin plot, we compare the $/sqm by flat type. Thus, it can be see that the prices is driven by demand that the median price for a 3-room flat can be higher than a 5-room flat.
Thus, the prices of the flat may have a higher correlation to location (accessibility), demand for flats, affordability. 5-room flats which cost higher due to the area range may have a lower demand.
It can be noted that the Singapore housing rates are rather stable with all 3 flat types have very close median of only $100-$200 difference.
We can see that for the North, North-East and West, the histogram is rather uniformed. However, for East and Central, the prices are binomial.
HDBDATA %>%
grouped_gghistostats(
x = resale_price,
test.value = 50,
type = "nonparametric",
grouping.var = region,
normal.curve = TRUE,
normal.curve.args = list(color = "red", size = 1),
ggtheme = ggthemes::theme_tufte(),
## modify the defaults from `{ggstatsplot}` for each plot
plotgrid.args = list(nrow = 2),
annotation.args = list(title = "Resale price by region")
)
We can see that there is an extremely high correlation between resale price and floor area.
scdata <- highlight_key(HDBDATA)
sc1 <- ggplot(data = scdata, aes(x = town, y = resale_price, colour = region)) + geom_point() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) + scale_y_continuous(breaks = c(200000,500000,1000000,150000))
sc2 <- ggplot(data = scdata, aes(x = town, y = price_per_sqm, colour = region)) + geom_point() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1), legend.position='none') +
scale_y_continuous(breaks = c(3000,6000,9000,12000,15000)) +
labs(title = "Price vs Price per sqm")
subplot(ggplotly(sc1), ggplotly(sc2))When compared, we can see that the distribution of price vs price/sqm is very different. Even though there is a huge jump in price (e.g Queenstown) where there appears to be an “exception”, the price/sqm does not show that jump. Thus, using price/sqm is a better comparision.
However, we do notice that when we look at the highest priced property in Clementi, we noticed that it is actually not the highest priced/sqm. This goes to show that even though area is a huge factor, there maybe other reasons which will still slightly affect the price of a property.
HDBDATA %>%
mutate(class = fct_reorder(town, price_per_sqm, .fun="mean")) %>%
ggplot(aes(y =reorder(town, price_per_sqm),
x = price_per_sqm, fill = region)) +
geom_boxplot() + stat_summary(fun.y=mean, geom = "point", colour="yellow") +
labs(x = "Price per sqm", y = "Town", title = "Price per sqm by town")Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
ℹ Please use the `fun` argument instead.

The data generated is not surprising with Central region dominating the price. However, we can tell that Clementi is also a popular area despite being in the West. It is also interesting to note that for old estates like Bishan and Ang Mo Kio, the prices are of the extreme ends with many 1 off exceptions. This was also mentioned in the news for Bishan being sold at a million dollars1.
HDBDATA %>%
group_by(region) %>%
mutate(class = fct_reorder(region, price_per_sqm, .fun="mean")) %>%
ggplot(mapping = aes(y = flat_type, x = price_per_sqm)) +
# Make grouped boxplot
geom_boxplot(aes(fill = as.factor(region))) +
theme(legend.position = "top") +
# Adjust lables and add title
labs(title = "HDB resale prices in 2022 by region", y="Flat Type", x = "Price per sqm", fill = "flat_type")
Using this boxplot, we can tell that there is a huge price gap by location with Central far off from the rest of the regions for only 5 room and 4 room flats. For the rest of the regions, the rankings are rather similar with North-East being the next most expensive and the West being the cheapest.
For 3 room flats, there is not much price discrepancies.
`summarise()` has grouped output by 'town'. You can override using the
`.groups` argument.
heatmap <- ggplot(data = floorheatmap,
mapping = aes(x = town, y = storey_range, fill = median_price)) +
geom_tile() +
labs(title = "Heatmap of HDB breakdown by area and storey", x = "Town", y = "Storey") +
scale_fill_gradient(name = "Median Resale Price/sqm",
low = "peachpuff",
high = "deeppink4")+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
heatmap
With this heatmap, we can confirm that the higher the storey leve, the higher the price. Jurong West and Geylang are the only exceptions where there is property which are cheaper for a lower level. These could be exceptions where other factors such as number of years on remaining lease comes into play. Referencing to the graph in 3.1, the property in Geylang and Jurong have varied aged property.
a <-
ggplot(HDBDATA, aes(x = rmlease, y = resale_price,
size = floor_area_sqm,
colour = region)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_size(range = c(2, 12)) +
labs(title = '2022: {as.integer(frame_time)} Month',
x = 'Remaining Lease',
y = 'Resale Price') +
transition_time(month) + #<<
ease_aes('linear') #<<
a
From this animated graph over the cross of 12 months, we can tell that there is no hugh fluctuations in property prices by the price for each region is relatively stable with the exception of Central whose prices should be driven other factors.
Legend:
Size of dot: Size of Property
Colour: by Region (Consistent with graphs above)
https://www.straitstimes.com/singapore/housing/five-room-hdb-dbss-flat-in-bishan-sold-for-record-1295-million-three-weeks-after↩︎